{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8e5ea381-9420-4ce0-8dea-7bbb4f3f3daa",
   "metadata": {},
   "source": [
    "# Hello NPU\n",
    "\n",
    "## Working with NPU in OpenVINO™\n",
    "\n",
    "#### Table of contents:\n",
    "- [Introduction](#Introduction)\n",
    "    - [Install required packages](#Install-required-packages)\n",
    "- [Checking NPU with Query Device](#Checking-NPU-with-Query-Device)\n",
    "    - [List the NPU with core.available_devices](#List-the-NPU-with-core.available_devices)\n",
    "    - [Check Properties with core.get_property](#Check-Properties-with-core.get_property)\n",
    "    - [Brief Descriptions of Key Properties](#Brief-Descriptions-of-Key-Properties)\n",
    "- [Compiling a Model on NPU](#Compiling-a-Model-on-NPU)\n",
    "    - [Download a Model](#Download-and-Convert-a-Model)\n",
    "    - [Compile with Default Configuration](#Compile-with-Default-Configuration)\n",
    "    - [Reduce Compile Time through Model Caching](#Reduce-Compile-Time-through-Model-Caching)\n",
    "        - [UMD Model Caching](#UMD-Model-Caching)\n",
    "        - [OpenVINO Model Caching](#OpenVINO-Model-Caching)\n",
    "    - [Throughput and Latency Performance Hints](#Throughput-and-Latency-Performance-Hints)\n",
    "- [Performance Comparison with benchmark_app](#Performance-Comparison-with-benchmark_app)\n",
    "    - [NPU vs CPU with Latency Hint](#NPU-vs-CPU-with-Latency-Hint)\n",
    "        - [Effects of UMD Model Caching](#Effects-of-UMD-Model-Caching)\n",
    "    - [NPU vs CPU with Throughput Hint](#NPU-vs-CPU-with-Throughput-Hint)\n",
    "- [Limitations](#Limitations)\n",
    "- [Conclusion](#Conclusion)\n",
    "### Installation Instructions\n",
    "\n",
    "This is a self-contained example that relies solely on its own code.\n",
    "\n",
    "We recommend  running the notebook in a virtual environment. You only need a Jupyter server to start.\n",
    "For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).\n",
    "\n",
    "<img referrerpolicy=\"no-referrer-when-downgrade\" src=\"https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/hello-npu/hello-npu.ipynb\" />\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6a83166-b033-4b20-b223-5eec6b84f46b",
   "metadata": {},
   "source": [
    "This tutorial provides a high-level overview of working with the NPU device **Intel(R) AI Boost** (introduced with the Intel® Core™ Ultra generation of CPUs) in OpenVINO. It explains some of the key properties of the NPU and shows how to compile a model on NPU with performance hints.\n",
    "\n",
    "This tutorial also shows example commands for benchmark_app that can be run to compare NPU performance with CPU in different configurations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45805698-72df-4197-b6e5-d57fdc3a365e",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdfc9e1f-f463-4ffe-820b-c924857fb066",
   "metadata": {},
   "source": [
    "The Neural Processing Unit (NPU) is a low power hardware solution which enables you to offload certain neural network computation tasks from other devices, for more streamlined resource management.\n",
    "\n",
    "Note that the NPU plugin is included in PIP installation of OpenVINO™ and you need to [install a proper NPU driver](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-npu.html) to use it successfully.\n",
    "\n",
    "**Supported Platforms**:  \n",
    "    Host: Intel® Core™ Ultra  \n",
    "    NPU device: NPU 3720  \n",
    "    OS: Ubuntu 22.04 (with Linux Kernel 6.6+), MS Windows 11 (both 64-bit)\n",
    "\n",
    "To learn more about the NPU Device, see the [page](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html).\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d568b8f-5f53-42f6-82b9-ee29412b4f96",
   "metadata": {},
   "source": [
    "### Install required packages\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69c5b00e-0ec4-4c57-b9c3-1a25e2c264c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -q \"openvino>=2024.1.0\" huggingface_hub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ea02840",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "from pathlib import Path\n",
    "\n",
    "if not Path(\"notebook_utils.py\").exists():\n",
    "    r = requests.get(\n",
    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n",
    "    )\n",
    "    open(\"notebook_utils.py\", \"w\").write(r.text)\n",
    "\n",
    "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n",
    "from notebook_utils import collect_telemetry\n",
    "\n",
    "collect_telemetry(\"hello-npu.ipynb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e19f77db-f27a-493c-b389-af520eac5422",
   "metadata": {},
   "source": [
    "## Checking NPU with Query Device\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c632f40b-1a59-4c9e-b8e0-6eec3ca81703",
   "metadata": {},
   "source": [
    "In this section, we will see how to list the available NPU and check its properties. Some of the key properties will be defined."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "731e32fb-9a74-4e76-8f77-62f16b0d49a8",
   "metadata": {},
   "source": [
    "### List the NPU with core.available_devices\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84eab2e7-36f6-45f5-84d5-7a946afc7534",
   "metadata": {},
   "source": [
    "OpenVINO Runtime provides the ```available_devices``` method for checking which devices are available for inference. The following code will output a list a compatible OpenVINO devices, in which Intel NPU should appear (ensure that the driver is installed successfully). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2f930769-b305-4f68-8cce-a86258f80af2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['CPU', 'GPU', 'NPU']"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import openvino as ov\n",
    "\n",
    "core = ov.Core()\n",
    "core.available_devices"
   ]
  },
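  {
   "cell_type": "markdown",
   "id": "npu-device-fallback-sketch",
   "metadata": {},
   "source": [
    "On a machine without an NPU or its driver, `NPU` will be missing from the list above. As a minimal sketch (not part of the original tutorial), the hypothetical helper `pick_device` below returns a preferred device when it is listed and falls back to another one otherwise. The helper name and the `CPU` fallback are assumptions made for illustration.\n",
    "\n",
    "```python\n",
    "def pick_device(available, preferred='NPU', fallback='CPU'):\n",
    "    # Hypothetical helper: return the preferred device name if it is\n",
    "    # in the list of available devices, otherwise the fallback name.\n",
    "    return preferred if preferred in available else fallback\n",
    "\n",
    "\n",
    "# Example with the device list shown above, and with the NPU missing\n",
    "print(pick_device(['CPU', 'GPU', 'NPU']))  # NPU\n",
    "print(pick_device(['CPU', 'GPU']))  # CPU\n",
    "```\n",
    "\n",
    "In this notebook it could be called as `pick_device(core.available_devices)`. The rest of the tutorial assumes the NPU is present."
   ]
  },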
  {
   "cell_type": "markdown",
   "id": "87cb433f-7c7e-4e0f-b0b3-31be4eabc8d1",
   "metadata": {},
   "source": [
    "### Check Properties with core.get_property\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b97c7f8-f697-4df6-8bfd-ec2fbfb8273d",
   "metadata": {},
   "source": [
    "To get information about the NPU, we can use device properties. In OpenVINO, devices have properties that describe their characteristics and configurations. Each property has a name and associated value that can be queried with the ```get_property``` method."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c60fa7b1-0b43-4a92-ac6f-d4e9d96c88fd",
   "metadata": {},
   "source": [
    "To get the value of a property, such as the device name, we can use the ```get_property``` method as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "861af905-a574-4c74-a8a9-e7735c9df43a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Intel(R) AI Boost'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import openvino.properties as props\n",
    "\n",
    "\n",
    "device = \"NPU\"\n",
    "\n",
    "core.get_property(device, props.device.full_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70889c34-74f8-4a7a-b23d-166311c7c02d",
   "metadata": {},
   "source": [
    "Each device also has a specific property called ```SUPPORTED_PROPERTIES```, that enables viewing all the available properties in the device. We can check the value for each property by simply looping through the dictionary returned by ```core.get_property(\"NPU\", props.supported_properties)``` and then querying for that property."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3560b0f4-e105-4004-aefe-d685667925c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"{device} SUPPORTED_PROPERTIES:\\n\")\n",
    "supported_properties = core.get_property(device, props.supported_properties)\n",
    "indent = len(max(supported_properties, key=len))\n",
    "\n",
    "for property_key in supported_properties:\n",
    "    if property_key not in (\"SUPPORTED_METRICS\", \"SUPPORTED_CONFIG_KEYS\", \"SUPPORTED_PROPERTIES\"):\n",
    "        try:\n",
    "            property_val = core.get_property(device, property_key)\n",
    "        except TypeError:\n",
    "            property_val = \"UNSUPPORTED TYPE\"\n",
    "        print(f\"{property_key:<{indent}}: {property_val}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "890be759-744a-49f1-9261-81feee9e09c4",
   "metadata": {},
   "source": [
    "### Brief Descriptions of Key Properties\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a5f17fe-1f06-4026-9668-196853d55278",
   "metadata": {},
   "source": [
    "Each device has several properties as seen in the last command. Some of the key properties are:\n",
    "- `FULL_DEVICE_NAME` - The product name of the NPU.\n",
    "- `PERFORMANCE_HINT` - A high-level way to tune the device for a specific performance metric, such as latency or throughput, without worrying about device-specific settings.\n",
    "- `CACHE_DIR` - The directory where the OpenVINO model cache data is stored to speed up the compilation time.\n",
    "- `OPTIMIZATION_CAPABILITIES` - The model data types (INT8, FP16, FP32, etc) that are supported by this NPU.\n",
    "\n",
    "To learn more about devices and properties, see the [Query Device Properties](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/query-device-properties.html) page."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f95f2fa-d804-480e-831e-b118f637e646",
   "metadata": {},
   "source": [
    "## Compiling a Model on NPU\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77317cda-9e17-45bc-8122-af6f4260f3ce",
   "metadata": {},
   "source": [
    "Now, we know the NPU present in the system and we have checked its properties. We can easily use it for compiling and running models with OpenVINO NPU plugin."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ac8055c-b45c-478d-bc75-ec68b8b17ff2",
   "metadata": {},
   "source": [
    "## Download a Model\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6df3a6aa-f35f-482f-bb12-1d2bee2575fd",
   "metadata": {},
   "source": [
    "This tutorial uses the `resnet50` model. The `resnet50` model is used for image classification tasks. The model was trained on [ImageNet](https://www.image-net.org/index.php) dataset which contains over a million images categorized into 1000 classes. To read more about resnet50, see the [paper](https://ieeexplore.ieee.org/document/7780459).\n",
    "As our tutorial focused on inference part, we skip model conversion step. To convert this Pytorch model to OpenVINO IR, [Model Conversion API](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html) should be used. Please check this [tutorial](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/pytorch-to-openvino/pytorch-to-openvino.ipynb) for details how to convert pytorch model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f6dd041c-6927-4f3a-a779-21cb90565811",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# create a directory for resnet model file\n",
    "MODEL_DIRECTORY_PATH = Path(\"model\")\n",
    "MODEL_DIRECTORY_PATH.mkdir(exist_ok=True)\n",
    "\n",
    "model_name = \"resnet50\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "50840dd5-4792-45b1-bbea-a66ac581b9b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import huggingface_hub as hf_hub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "5cea2106-3274-4b3f-9fad-882f27dd9a0b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Read IR model from model\\ir_model\\resnet50_fp16.xml\n"
     ]
    }
   ],
   "source": [
    "precision = \"FP16\"\n",
    "\n",
    "model_path = MODEL_DIRECTORY_PATH / \"ir_model\" / f\"{model_name}_{precision.lower()}.xml\"\n",
    "\n",
    "model = None\n",
    "if not model_path.exists():\n",
    "    hf_hub.snapshot_download(\"katuni4ka/resnet50_fp16\", local_dir=model_path.parent)\n",
    "    print(\"IR model saved to {}\".format(model_path))\n",
    "    model = core.read_model(model_path)\n",
    "else:\n",
    "    print(\"Read IR model from {}\".format(model_path))\n",
    "    model = core.read_model(model_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a3af027-41bf-4281-845e-41b445e99f3d",
   "metadata": {},
   "source": [
    "**Note:** NPU also supports `INT8` quantized models."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c665ad4f-2c2c-4d1a-b429-92279c5b6629",
   "metadata": {},
   "source": [
    "### Compile with Default Configuration\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1bbbdd98-ffd3-4568-b6b3-eb1ce8528c39",
   "metadata": {},
   "source": [
    "When the model is ready, first we need to read it, using the `read_model` method. Then, we can use the `compile_model` method and specify the name of the device we want to compile the model on, in this case, \"NPU\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "95d7a069-60d7-40e0-8dd9-82f6727cc9f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "compiled_model = core.compile_model(model, device)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4ff6268-618b-44bf-8af3-70b291c79dca",
   "metadata": {},
   "source": [
    "### Reduce Compile Time through Model Caching\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cf2ae0e-d652-4e1e-b461-d382eee0257e",
   "metadata": {},
   "source": [
    "Depending on the model used, device-specific optimizations and network compilations can cause the compile step to be time-consuming, especially with larger models, which may lead to bad user experience in the application. To solve this **Model Caching** can be used."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3388175a-9f43-412e-be84-6a2a3dad9799",
   "metadata": {},
   "source": [
    "Model Caching helps reduce application startup delays by exporting and reusing the compiled model automatically. The following two compilation-related metrics are crucial in this area:\n",
    "\n",
    "- **First-Ever Inference Latency (FEIL)**:  \n",
    "  Measures all steps  required to compile and execute a model on the device for the first time. It includes model compilation time, the time required to load and initialize the model on the device and the first inference execution.\n",
    "- **First Inference Latency (FIL)**:  \n",
    "  Measures the time required to load and initialize the pre-compiled model on the device and the first inference execution."
   ]
  },
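  {
   "cell_type": "markdown",
   "id": "feil-fil-timing-sketch",
   "metadata": {},
   "source": [
    "Both metrics can be approximated by timing the individual steps. The sketch below is a rough illustration, assuming a hypothetical helper `timed` (not an OpenVINO API) that wraps any callable. A stand-in workload replaces the real compile and inference calls so that the snippet runs without a device.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "\n",
    "def timed(fn, *args, **kwargs):\n",
    "    # Hypothetical helper: call fn and return (result, elapsed seconds).\n",
    "    start = time.perf_counter()\n",
    "    result = fn(*args, **kwargs)\n",
    "    return result, time.perf_counter() - start\n",
    "\n",
    "\n",
    "# Stand-in workload. In this notebook the helper could instead wrap\n",
    "# core.compile_model (compile and load step) and the first call of the\n",
    "# compiled model (first inference).\n",
    "total, elapsed = timed(sum, range(1000))\n",
    "print(f'result={total}, elapsed={elapsed:.6f}s')\n",
    "```\n",
    "\n",
    "With the objects defined in this notebook, `timed(core.compile_model, model, device)` would time the compile step. On a cold cache, adding the time of the first inference gives an estimate of FEIL; on a warm cache, the same sum is closer to FIL."
   ]
  },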
  {
   "cell_type": "markdown",
   "id": "661175bc-6497-4329-89c9-c18df52d5b21",
   "metadata": {},
   "source": [
    "In NPU, UMD model caching is a solution enabled by default by the driver. It improves time to first inference (FIL) by storing the model in the cache after compilation (included in FEIL). Learn more about UMD Caching [here](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#umd-dynamic-model-caching). Due to this caching, it takes lesser time to load the model after first compilation.\n",
    "\n",
    "You can also use OpenVINO Model Caching, which is a common mechanism for all OpenVINO device plugins and can be enabled by setting the `cache_dir` property.  \n",
    "By enabling OpenVINO Model Caching, the UMD caching is automatically bypassed by the NPU plugin, which means the model will only be stored in the OpenVINO cache after compilation. When a cache hit occurs for subsequent compilation requests, the plugin will import the model instead of recompiling it."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f21103bc-82ff-4f8a-8ccc-bb8a30219e0f",
   "metadata": {},
   "source": [
    "#### UMD Model Caching\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad548de7-b591-454e-b049-7ce8c50a2975",
   "metadata": {},
   "source": [
    "To see how UMD caching see the following example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "8719d2ec-2866-4f3b-ba62-098f9190ce6c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "UMD Caching (first time) - compile time: 3.2854952812194824s\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "from pathlib import Path\n",
    "\n",
    "start = time.time()\n",
    "core = ov.Core()\n",
    "\n",
    "# Compile the model as before\n",
    "model = core.read_model(model=model_path)\n",
    "compiled_model = core.compile_model(model, device)\n",
    "print(f\"UMD Caching (first time) - compile time: {time.time() - start}s\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "3cdfeaa5-8f2e-4e94-b45f-dd23b14e0408",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "UMD Caching - compile time: 2.269814968109131s\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "core = ov.Core()\n",
    "\n",
    "# Compile the model once again to see UMD Caching\n",
    "model = core.read_model(model=model_path)\n",
    "compiled_model = core.compile_model(model, device)\n",
    "print(f\"UMD Caching - compile time: {time.time() - start}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4be65ac-e2fc-4ac1-ab73-9f72025992a5",
   "metadata": {},
   "source": [
    "#### OpenVINO Model Caching\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "335fa1fa-5acf-4ee0-94d7-0f56c5546aed",
   "metadata": {},
   "source": [
    "To get an idea of OpenVINO model caching, we can use the OpenVINO cache as follow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "1e5683f1-514d-4c72-b4ae-683849ca7be9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cache enabled (first time) - compile time: 0.6362860202789307s\n",
      "Cache enabled (second time) - compile time: 0.3032548427581787s\n"
     ]
    }
   ],
   "source": [
    "# Create cache folder\n",
    "cache_folder = Path(\"cache\")\n",
    "cache_folder.mkdir(exist_ok=True)\n",
    "\n",
    "start = time.time()\n",
    "core = ov.Core()\n",
    "\n",
    "# Set cache folder\n",
    "core.set_property({props.cache_dir(): cache_folder})\n",
    "\n",
    "# Compile the model\n",
    "model = core.read_model(model=model_path)\n",
    "compiled_model = core.compile_model(model, device)\n",
    "print(f\"Cache enabled (first time) - compile time: {time.time() - start}s\")\n",
    "\n",
    "start = time.time()\n",
    "core = ov.Core()\n",
    "\n",
    "# Set cache folder\n",
    "core.set_property({props.cache_dir(): cache_folder})\n",
    "\n",
    "# Compile the model as before\n",
    "model = core.read_model(model=model_path)\n",
    "compiled_model = core.compile_model(model, device)\n",
    "print(f\"Cache enabled (second time) - compile time: {time.time() - start}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f05a870a-686a-44f2-9b2f-1e4f9ccf2447",
   "metadata": {},
   "source": [
    "And when the OpenVINO cache is disabled:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "39f336b6-0237-4cec-b8af-fbf371351cce",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cache disabled - compile time: 3.0127954483032227s\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "core = ov.Core()\n",
    "model = core.read_model(model=model_path)\n",
    "compiled_model = core.compile_model(model, device)\n",
    "print(f\"Cache disabled - compile time: {time.time() - start}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b55b71b-f6f6-4619-aed1-d216dc83ab9c",
   "metadata": {},
   "source": [
    "The actual time improvements will depend on the environment as well as the model being used but it is definitely something to consider when optimizing an application. To read more about this, see the [Model Caching docs](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9240250b-15ac-46b4-b1ba-055f04a7ed15",
   "metadata": {},
   "source": [
    "### Throughput and Latency Performance Hints\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1610f4ba-748c-44d1-acae-52f046cc7697",
   "metadata": {},
   "source": [
    "To simplify device and pipeline configuration, OpenVINO provides high-level performance hints that automatically set the batch size and number of parallel threads for inference. The \"LATENCY\" performance hint optimizes for fast inference times while the \"THROUGHPUT\" performance hint optimizes for high overall bandwidth or FPS."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1cccd1b5-4d5a-41f3-8d8a-4ee0bc235a9e",
   "metadata": {},
   "source": [
    "To use the \"LATENCY\" performance hint, add `{hints.performance_mode(): hints.PerformanceMode.LATENCY}` when compiling the model as shown below. For NPU, this automatically minimizes the batch size and number of parallel streams such that all of the compute resources can focus on completing a single inference as fast as possible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "0773486e-8294-455c-8d13-c803c4c68961",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openvino.properties.hint as hints\n",
    "\n",
    "\n",
    "compiled_model = core.compile_model(model, device, {hints.performance_mode(): hints.PerformanceMode.LATENCY})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ca1f3d8-202c-4a98-85bc-b66110120dfb",
   "metadata": {},
   "source": [
    "To use the \"THROUGHPUT\" performance hint, add `{hints.performance_mode(): hints.PerformanceMode.THROUGHPUT}` when compiling the model. For NPUs, this creates multiple processing streams to efficiently utilize all the execution cores and optimizes the batch size to fill the available memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "162dc563-c80e-4e4b-ad77-20d80715a19c",
   "metadata": {},
   "outputs": [],
   "source": [
    "compiled_model = core.compile_model(model, device, {hints.performance_mode(): hints.PerformanceMode.THROUGHPUT})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0328676a-f569-4a4d-bda0-3fb47b97a420",
   "metadata": {},
   "source": [
    "## Performance Comparison with benchmark_app\n",
    "[back to top ⬆️](#Table-of-contents:)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41396503-d589-4b98-be91-e2277d6c7808",
   "metadata": {},
   "source": [
    "Given all the different options available when compiling a model, it may be difficult to know which settings work best for a certain application. Thankfully, OpenVINO provides `benchmark_app` - a performance benchmarking tool."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "864304b8-e129-4d14-adcc-efe3ad350456",
   "metadata": {},
   "source": [
    "The basic syntax of `benchmark_app` is as follows:\n",
    "\n",
    "`\n",
    "benchmark_app -m PATH_TO_MODEL -d TARGET_DEVICE -hint {throughput,cumulative_throughput,latency,none}\n",
    "`\n",
    "\n",
    "where `TARGET_DEVICE` is any device shown by the `available_devices` method as well as the MULTI and AUTO devices we saw previously, and the value of hint should be one of the values between brackets.\n",
    "\n",
    "Note that benchmark_app only requires the model path to run but both device and hint arguments will be useful to us. For more advanced usages, the tool itself has other options that can be checked by running `benchmark_app -h` or reading the [docs](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/benchmark-tool.html). The following example shows us to benchmark a simple model, using a NPU with latency focus:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7915a0b1-f0a9-41b5-b38b-3cbf97a69a7d",
   "metadata": {},
   "source": [
    "`benchmark_app -m {model_path} -d NPU -hint latency`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86af2b7b-6723-403a-80f9-0ee59c9429b2",
   "metadata": {},
   "source": [
    "For completeness, let us list here some of the comparisons we may want to do by varying the device and hint used. Note that the actual performance may depend on the hardware used. Generally, we should expect NPU to be better than CPU.  \n",
    "Please refer to the `benchmark_app` log entries under `[Step 11/11] Dumping statistics report` to observe the differences in latency and throughput between the CPU and NPU.."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de8a7194-60d9-4b74-ac8b-f663241c60f4",
   "metadata": {},
   "source": [
    "#### NPU vs CPU with Latency Hint\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "183bc1d4-b0a6-4b3a-ae8d-7fcb21d596ea",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Step 1/11] Parsing and validating input arguments\n",
      "[ INFO ] Parsing input parameters\n",
      "[Step 2/11] Loading OpenVINO Runtime\n",
      "[ INFO ] OpenVINO:\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] Device info:\n",
      "[ INFO ] CPU\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] \n",
      "[Step 3/11] Setting device configuration\n",
      "[Step 4/11] Reading model files\n",
      "[ INFO ] Loading model files\n",
      "[ INFO ] Read model took 14.00 ms\n",
      "[ INFO ] Original model I/O parameters:\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : f32 / [...] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 5/11] Resizing model to match image sizes and given batch\n",
      "[ INFO ] Model batch size: 1\n",
      "[Step 6/11] Configuring input of the model\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : u8 / [N,C,H,W] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 7/11] Loading the model to the device\n",
      "[ INFO ] Compile model took 143.22 ms\n",
      "[Step 8/11] Querying optimal runtime parameters\n",
      "[ INFO ] Model:\n",
      "[ INFO ]   NETWORK_NAME: Model2\n",
      "[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1\n",
      "[ INFO ]   NUM_STREAMS: 1\n",
      "[ INFO ]   AFFINITY: Affinity.HYBRID_AWARE\n",
      "[ INFO ]   INFERENCE_NUM_THREADS: 12\n",
      "[ INFO ]   PERF_COUNT: NO\n",
      "[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>\n",
      "[ INFO ]   PERFORMANCE_HINT: LATENCY\n",
      "[ INFO ]   EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE\n",
      "[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0\n",
      "[ INFO ]   ENABLE_CPU_PINNING: False\n",
      "[ INFO ]   SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE\n",
      "[ INFO ]   MODEL_DISTRIBUTION_POLICY: set()\n",
      "[ INFO ]   ENABLE_HYPER_THREADING: False\n",
      "[ INFO ]   EXECUTION_DEVICES: ['CPU']\n",
      "[ INFO ]   CPU_DENORMALS_OPTIMIZATION: False\n",
      "[ INFO ]   LOG_LEVEL: Level.NO\n",
      "[ INFO ]   CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0\n",
      "[ INFO ]   DYNAMIC_QUANTIZATION_GROUP_SIZE: 0\n",
      "[ INFO ]   KV_CACHE_PRECISION: <Type: 'float16'>\n",
      "[Step 9/11] Creating infer requests and preparing input tensors\n",
      "[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!\n",
      "[ INFO ] Fill input 'x' with random values \n",
      "[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, limits: 60000 ms duration)\n",
      "[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n",
      "[ INFO ] First inference took 28.95 ms\n",
      "[Step 11/11] Dumping statistics report\n",
      "[ INFO ] Execution Devices:['CPU']\n",
      "[ INFO ] Count:            1612 iterations\n",
      "[ INFO ] Duration:         60039.72 ms\n",
      "[ INFO ] Latency:\n",
      "[ INFO ]    Median:        39.99 ms\n",
      "[ INFO ]    Average:       37.13 ms\n",
      "[ INFO ]    Min:           19.13 ms\n",
      "[ INFO ]    Max:           71.94 ms\n",
      "[ INFO ] Throughput:   26.85 FPS\n"
     ]
    }
   ],
   "source": [
    "!benchmark_app -m {model_path} -d CPU -hint latency"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "d5aa1702-9118-4c62-bba3-ca6ca1cf0961",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Step 1/11] Parsing and validating input arguments\n",
      "[ INFO ] Parsing input parameters\n",
      "[Step 2/11] Loading OpenVINO Runtime\n",
      "[ INFO ] OpenVINO:\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] Device info:\n",
      "[ INFO ] NPU\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] \n",
      "[Step 3/11] Setting device configuration\n",
      "[Step 4/11] Reading model files\n",
      "[ INFO ] Loading model files\n",
      "[ INFO ] Read model took 11.51 ms\n",
      "[ INFO ] Original model I/O parameters:\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : f32 / [...] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 5/11] Resizing model to match image sizes and given batch\n",
      "[ INFO ] Model batch size: 1\n",
      "[Step 6/11] Configuring input of the model\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : u8 / [N,C,H,W] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 7/11] Loading the model to the device\n",
      "[ INFO ] Compile model took 2302.40 ms\n",
      "[Step 8/11] Querying optimal runtime parameters\n",
      "[ INFO ] Model:\n",
      "[ INFO ]   DEVICE_ID: \n",
      "[ INFO ]   ENABLE_CPU_PINNING: False\n",
      "[ INFO ]   EXECUTION_DEVICES: NPU.3720\n",
      "[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float16'>\n",
      "[ INFO ]   INTERNAL_SUPPORTED_PROPERTIES: {'CACHING_PROPERTIES': 'RO'}\n",
      "[ INFO ]   LOADED_FROM_CACHE: False\n",
      "[ INFO ]   NETWORK_NAME: \n",
      "[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1\n",
      "[ INFO ]   PERFORMANCE_HINT: PerformanceMode.LATENCY\n",
      "[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 1\n",
      "[ INFO ]   PERF_COUNT: False\n",
      "[Step 9/11] Creating infer requests and preparing input tensors\n",
      "[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!\n",
      "[ INFO ] Fill input 'x' with random values \n",
      "[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, limits: 60000 ms duration)\n",
      "[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n",
      "[ INFO ] First inference took 7.94 ms\n",
      "[Step 11/11] Dumping statistics report\n",
      "[ INFO ] Execution Devices:NPU.3720\n",
      "[ INFO ] Count:            17908 iterations\n",
      "[ INFO ] Duration:         60004.49 ms\n",
      "[ INFO ] Latency:\n",
      "[ INFO ]    Median:        3.29 ms\n",
      "[ INFO ]    Average:       3.33 ms\n",
      "[ INFO ]    Min:           3.21 ms\n",
      "[ INFO ]    Max:           6.90 ms\n",
      "[ INFO ] Throughput:   298.44 FPS\n"
     ]
    }
   ],
   "source": [
    "!benchmark_app -m {model_path} -d NPU -hint latency"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f9714ec-dd99-4aa8-bfc4-9274df57d7fb",
   "metadata": {},
   "source": [
    "##### Effects of UMD Model Caching\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "To see the effect of UMD model caching, run `benchmark_app` on the NPU a second time and compare the model read time (`[Step 4/11]`) and the compile time (`[Step 7/11]`) with the previous run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "8c5c1982-909d-42b2-95f2-ec6f0c7790dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Step 1/11] Parsing and validating input arguments\n",
      "[ INFO ] Parsing input parameters\n",
      "[Step 2/11] Loading OpenVINO Runtime\n",
      "[ INFO ] OpenVINO:\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] Device info:\n",
      "[ INFO ] NPU\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] \n",
      "[Step 3/11] Setting device configuration\n",
      "[Step 4/11] Reading model files\n",
      "[ INFO ] Loading model files\n",
      "[ INFO ] Read model took 11.00 ms\n",
      "[ INFO ] Original model I/O parameters:\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : f32 / [...] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 5/11] Resizing model to match image sizes and given batch\n",
      "[ INFO ] Model batch size: 1\n",
      "[Step 6/11] Configuring input of the model\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : u8 / [N,C,H,W] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 7/11] Loading the model to the device\n",
      "[ INFO ] Compile model took 2157.58 ms\n",
      "[Step 8/11] Querying optimal runtime parameters\n",
      "[ INFO ] Model:\n",
      "[ INFO ]   DEVICE_ID: \n",
      "[ INFO ]   ENABLE_CPU_PINNING: False\n",
      "[ INFO ]   EXECUTION_DEVICES: NPU.3720\n",
      "[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float16'>\n",
      "[ INFO ]   INTERNAL_SUPPORTED_PROPERTIES: {'CACHING_PROPERTIES': 'RO'}\n",
      "[ INFO ]   LOADED_FROM_CACHE: False\n",
      "[ INFO ]   NETWORK_NAME: \n",
      "[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1\n",
      "[ INFO ]   PERFORMANCE_HINT: PerformanceMode.LATENCY\n",
      "[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 1\n",
      "[ INFO ]   PERF_COUNT: False\n",
      "[Step 9/11] Creating infer requests and preparing input tensors\n",
      "[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!\n",
      "[ INFO ] Fill input 'x' with random values \n",
      "[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, limits: 60000 ms duration)\n",
      "[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n",
      "[ INFO ] First inference took 7.94 ms\n",
      "[Step 11/11] Dumping statistics report\n",
      "[ INFO ] Execution Devices:NPU.3720\n",
      "[ INFO ] Count:            17894 iterations\n",
      "[ INFO ] Duration:         60004.76 ms\n",
      "[ INFO ] Latency:\n",
      "[ INFO ]    Median:        3.29 ms\n",
      "[ INFO ]    Average:       3.33 ms\n",
      "[ INFO ]    Min:           3.21 ms\n",
      "[ INFO ]    Max:           14.38 ms\n",
      "[ INFO ] Throughput:   298.21 FPS\n"
     ]
    }
   ],
   "source": [
    "!benchmark_app -m {model_path} -d NPU -hint latency"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "690798af-3c42-468b-ae71-f6c7e15907c7",
   "metadata": {},
   "source": [
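    "Compare the log entries `[Step 4/11] Reading model files` and `[Step 7/11] Loading the model to the device` in the two NPU latency runs: the read time drops from 11.51 ms to 11.00 ms, and the compile time drops from 2302.40 ms to 2157.58 ms. Reading and compiling the model takes less time after the initial load, although the reduction in this particular run is small."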
   ]
  },
  {
   "cell_type": "markdown",
   "id": "511fa8a4-c267-4c7a-9651-42d82cea6386",
   "metadata": {},
   "source": [
    "#### NPU vs CPU with Throughput Hint\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "9d570a8b-2cdc-4d04-bf5c-35e9e4f558ee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Step 1/11] Parsing and validating input arguments\n",
      "[ INFO ] Parsing input parameters\n",
      "[Step 2/11] Loading OpenVINO Runtime\n",
      "[ INFO ] OpenVINO:\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] Device info:\n",
      "[ INFO ] CPU\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] \n",
      "[Step 3/11] Setting device configuration\n",
      "[Step 4/11] Reading model files\n",
      "[ INFO ] Loading model files\n",
      "[ INFO ] Read model took 12.00 ms\n",
      "[ INFO ] Original model I/O parameters:\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : f32 / [...] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 5/11] Resizing model to match image sizes and given batch\n",
      "[ INFO ] Model batch size: 1\n",
      "[Step 6/11] Configuring input of the model\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : u8 / [N,C,H,W] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 7/11] Loading the model to the device\n",
      "[ INFO ] Compile model took 177.18 ms\n",
      "[Step 8/11] Querying optimal runtime parameters\n",
      "[ INFO ] Model:\n",
      "[ INFO ]   NETWORK_NAME: Model2\n",
      "[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4\n",
      "[ INFO ]   NUM_STREAMS: 4\n",
      "[ INFO ]   AFFINITY: Affinity.HYBRID_AWARE\n",
      "[ INFO ]   INFERENCE_NUM_THREADS: 16\n",
      "[ INFO ]   PERF_COUNT: NO\n",
      "[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>\n",
      "[ INFO ]   PERFORMANCE_HINT: THROUGHPUT\n",
      "[ INFO ]   EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE\n",
      "[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0\n",
      "[ INFO ]   ENABLE_CPU_PINNING: False\n",
      "[ INFO ]   SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE\n",
      "[ INFO ]   MODEL_DISTRIBUTION_POLICY: set()\n",
      "[ INFO ]   ENABLE_HYPER_THREADING: True\n",
      "[ INFO ]   EXECUTION_DEVICES: ['CPU']\n",
      "[ INFO ]   CPU_DENORMALS_OPTIMIZATION: False\n",
      "[ INFO ]   LOG_LEVEL: Level.NO\n",
      "[ INFO ]   CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0\n",
      "[ INFO ]   DYNAMIC_QUANTIZATION_GROUP_SIZE: 0\n",
      "[ INFO ]   KV_CACHE_PRECISION: <Type: 'float16'>\n",
      "[Step 9/11] Creating infer requests and preparing input tensors\n",
      "[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!\n",
      "[ INFO ] Fill input 'x' with random values \n",
      "[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)\n",
      "[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n",
      "[ INFO ] First inference took 31.62 ms\n",
      "[Step 11/11] Dumping statistics report\n",
      "[ INFO ] Execution Devices:['CPU']\n",
      "[ INFO ] Count:            3212 iterations\n",
      "[ INFO ] Duration:         60082.26 ms\n",
      "[ INFO ] Latency:\n",
      "[ INFO ]    Median:        65.28 ms\n",
      "[ INFO ]    Average:       74.60 ms\n",
      "[ INFO ]    Min:           35.65 ms\n",
      "[ INFO ]    Max:           157.31 ms\n",
      "[ INFO ] Throughput:   53.46 FPS\n"
     ]
    }
   ],
   "source": [
    "!benchmark_app -m {model_path} -d CPU -hint throughput"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "670aa35c-3a7c-45de-807f-d1c09f2e9f9a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Step 1/11] Parsing and validating input arguments\n",
      "[ INFO ] Parsing input parameters\n",
      "[Step 2/11] Loading OpenVINO Runtime\n",
      "[ INFO ] OpenVINO:\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] Device info:\n",
      "[ INFO ] NPU\n",
      "[ INFO ] Build ................................. 2024.1.0-14992-621b025bef4\n",
      "[ INFO ] \n",
      "[ INFO ] \n",
      "[Step 3/11] Setting device configuration\n",
      "[Step 4/11] Reading model files\n",
      "[ INFO ] Loading model files\n",
      "[ INFO ] Read model took 11.50 ms\n",
      "[ INFO ] Original model I/O parameters:\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : f32 / [...] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 5/11] Resizing model to match image sizes and given batch\n",
      "[ INFO ] Model batch size: 1\n",
      "[Step 6/11] Configuring input of the model\n",
      "[ INFO ] Model inputs:\n",
      "[ INFO ]     x (node: x) : u8 / [N,C,H,W] / [1,3,224,224]\n",
      "[ INFO ] Model outputs:\n",
      "[ INFO ]     x.45 (node: aten::linear/Add) : f32 / [...] / [1,1000]\n",
      "[Step 7/11] Loading the model to the device\n",
      "[ INFO ] Compile model took 2265.07 ms\n",
      "[Step 8/11] Querying optimal runtime parameters\n",
      "[ INFO ] Model:\n",
      "[ INFO ]   DEVICE_ID: \n",
      "[ INFO ]   ENABLE_CPU_PINNING: False\n",
      "[ INFO ]   EXECUTION_DEVICES: NPU.3720\n",
      "[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float16'>\n",
      "[ INFO ]   INTERNAL_SUPPORTED_PROPERTIES: {'CACHING_PROPERTIES': 'RO'}\n",
      "[ INFO ]   LOADED_FROM_CACHE: False\n",
      "[ INFO ]   NETWORK_NAME: \n",
      "[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4\n",
      "[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT\n",
      "[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 1\n",
      "[ INFO ]   PERF_COUNT: False\n",
      "[Step 9/11] Creating infer requests and preparing input tensors\n",
      "[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!\n",
      "[ INFO ] Fill input 'x' with random values \n",
      "[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)\n",
      "[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n",
      "[ INFO ] First inference took 7.95 ms\n",
      "[Step 11/11] Dumping statistics report\n",
      "[ INFO ] Execution Devices:NPU.3720\n",
      "[ INFO ] Count:            19080 iterations\n",
      "[ INFO ] Duration:         60024.79 ms\n",
      "[ INFO ] Latency:\n",
      "[ INFO ]    Median:        12.51 ms\n",
      "[ INFO ]    Average:       12.56 ms\n",
      "[ INFO ]    Min:           6.92 ms\n",
      "[ INFO ]    Max:           25.80 ms\n",
      "[ INFO ] Throughput:   317.87 FPS\n"
     ]
    }
   ],
   "source": [
    "!benchmark_app -m {model_path} -d NPU -hint throughput"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d00869a-1dad-4d5b-8df8-69cc07a36f98",
   "metadata": {},
   "source": [
    "## Limitations\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee1c5d72-5496-4150-84e6-92650ad9d2ab",
   "metadata": {},
   "source": [
    "1. Currently, only models with static shapes are supported on NPU.\n",
    "2. If the path to the model file contains non-ASCII characters, such as Chinese characters, the model cannot be used for inference on NPU, and an error is returned."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b72be49d-6be4-4594-9f9f-1fe1b1999e6e",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bddb4c03-b7b1-48d9-bbdd-ade8a1655e3a",
   "metadata": {},
   "source": [
    "This tutorial showed how to run a model on the NPU with OpenVINO, query the device properties, and tune model performance with the latency and throughput performance hints.\n",
    "\n",
    "To explore the Neural Processing Unit (NPU) further with OpenVINO, try these interactive Jupyter notebooks:\n",
    "\n",
    "##### Introduction\n",
    "- [**hello-world**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/hello-world): Start your OpenVINO journey by performing inference on an OpenVINO IR model.\n",
    "- [**hello-segmentation**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/hello-segmentation): Dive into inference with a segmentation model and explore image segmentation capabilities.\n",
    "\n",
    "##### Model Optimization and Conversion\n",
    "- [**tflite-to-openvino**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/tflite-to-openvino): Learn the process of converting TensorFlow Lite models to OpenVINO IR format.\n",
    "- [**yolov7-optimization**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov7-optimization): Optimize the YOLOv7 model for enhanced performance in OpenVINO.\n",
    "- [**yolov8-optimization**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov8-optimization): Convert and optimize YOLOv8 models for efficient deployment with OpenVINO.\n",
    "\n",
    "##### Advanced Computer Vision Techniques\n",
    "- [**vision-background-removal**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/vision-background-removal): Implement advanced image segmentation and background manipulation with U^2-Net.\n",
    "- [**handwritten-ocr**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/handwritten-ocr): Apply optical character recognition to handwritten Chinese and Japanese text.\n",
    "- [**vehicle-detection-and-recognition**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/vehicle-detection-and-recognition): Use pre-trained models for vehicle detection and recognition in images.\n",
    "- [**vision-image-colorization**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/vision-image-colorization): Bring black and white images to life by adding color with neural networks.\n",
    "\n",
    "##### Real-Time Webcam Applications\n",
    "- [**tflite-selfie-segmentation**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/tflite-selfie-segmentation): Apply TensorFlow Lite models for selfie segmentation and background processing.\n",
    "- [**object-detection-webcam**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/object-detection-webcam): Experience real-time object detection using your webcam and OpenVINO.\n",
    "- [**pose-estimation-webcam**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/pose-estimation-webcam): Perform human pose estimation in real-time with webcam integration.\n",
    "- [**action-recognition-webcam**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/action-recognition-webcam): Recognize and classify human actions live with your webcam.\n",
    "- [**style-transfer-webcam**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/style-transfer-webcam): Transform your webcam feed with artistic styles in real-time using pre-trained models.\n",
    "- [**3D-pose-estimation-webcam**](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/3D-pose-estimation-webcam): Perform 3D multi-person pose estimation with OpenVINO.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "openvino_notebooks": {
   "imageUrl": "",
   "tags": {
    "categories": [
     "API Overview"
    ],
    "libraries": [],
    "other": [],
    "tasks": [
     "Image Classification"
    ]
   }
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
